Effect of information gain on document classification using k-nearest neighbor
Authors
Abstract
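State universities have a library as a facility to support students' education and science, which contains various books, journals, and final assignments. An intelligent system for classifying documents is needed to ease library visitors in higher education, as a form of service to students. The documents in the library are generally the result of research. Various complaints related to the imbalance of text data across categories, to document titles that are irrelevant, and to words whose meaning is ambiguous when searching for documents are the main reasons for the need for a classification system. This research uses k-Nearest Neighbor (k-NN) to categorize documents by study interest, with information gain feature selection to handle unbalanced data and cosine similarity to measure the distance between test and training data. Based on the results of tests conducted on 276 data items, the highest result using information gain feature selection, with 80% training data and 20% test data, is an accuracy of 87.5% at a parameter value of k=5. The highest accuracy, 92.9%, is achieved without information gain feature selection, with a proportion of 90% training data and 10% test data and parameters k=5, 7, and 9. The paper concludes that classification without information gain feature selection has better accuracy than classification with it, because every word in the document title is considered to have an essential role in forming the classification.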
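The pipeline named in the abstract (information gain feature selection, then k-NN with cosine similarity) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy titles and labels, the binary "term occurs in title" definition of information gain, and the use of raw term counts as vectors are all assumptions made for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(docs, labels, term):
    """Information gain of the binary feature 'term occurs in the title'."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = sum(len(p) / n * entropy(p) for p in (with_term, without) if p)
    return entropy(labels) - conditional

def select_terms(docs, labels, keep):
    """Keep the `keep` terms with the highest information gain (ties: alphabetical)."""
    terms = {t for d in docs for t in d}
    ranked = sorted(terms, key=lambda t: (-information_gain(docs, labels, t), t))
    return set(ranked[:keep])

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def knn_predict(train_docs, train_labels, query, k=5, vocab=None):
    """Majority vote among the k training titles most similar to the query.

    If `vocab` is given, only those terms are used (feature selection);
    otherwise every word in the title contributes.
    """
    def vec(tokens):
        return Counter(t for t in tokens if vocab is None or t in vocab)

    q = vec(query)
    scored = sorted(
        ((cosine(vec(d), q), y) for d, y in zip(train_docs, train_labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(y for _, y in scored[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: tokenized titles and study-interest labels.
titles = [
    "neural network image classification".split(),
    "deep neural network speech".split(),
    "database query optimization".split(),
    "relational database index design".split(),
]
labels = ["ai", "ai", "db", "db"]

vocab = select_terms(titles, labels, keep=3)
with_selection = knn_predict(titles, labels, "neural network text".split(), k=3, vocab=vocab)
without_selection = knn_predict(titles, labels, "neural network text".split(), k=3)
```

Passing `vocab=None` corresponds to the "without feature selection" setting, in which every title word is kept; passing a selected vocabulary corresponds to the information gain setting. Comparing the two on a held-out split is one way to reproduce the kind of comparison the abstract reports.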
Similar resources
K-Nearest Neighbor Classification Using Anatomized Data
This paper analyzes k nearest neighbor classification with training data anonymized using anatomy. Anatomy preserves all data values, but introduces uncertainty in the mapping between identifying and sensitive values. We first study the theoretical effect of the anatomized training data on the k nearest neighbor error rate bounds, nearest neighbor convergence rate, and Bayesian error. We then v...
On Nearest Neighbor Classification Using Adaptive Choice of k
Nearest neighbor classification is one of the simplest and most popular methods for statistical pattern recognition. It assigns an observation x to the class that is most frequent in the neighborhood of x. The size of this neighborhood is usually determined by a predefined parameter k. Normally, one uses cross-validation techniques to estimate the optimum value of this parameter, and that e...
k-Nearest Neighbor Classification on Spatial Data
Classification of spatial data streams is crucial, since the training dataset changes often. Building a new classifier each time can be very costly with most techniques. In this situation, k-nearest neighbor (KNN) classification is a very good choice, since no residual classifier needs to be built ahead of time. KNN is extremely simple to implement and lends itself to a wide variety of variatio...
Drought Monitoring and Prediction using K-Nearest Neighbor Algorithm
Drought is a climate phenomenon that can occur in any climate condition and in all regions of the earth. Effective drought management depends on the application of appropriate drought indices. Drought indices are variables used to detect and characterize drought conditions. In this study, an attempt was made to predict drought occurrence, based on the standard precipitation index (SPI), usin...
An Enhancement of k-Nearest Neighbor Classification Using Genetic Algorithm
K-Nearest Neighbor Classification (kNNC) classifies an instance by taking the votes of its k nearest neighbors. The performance of kNNC depends largely on the efficient selection of those k nearest neighbors. Not all the attributes describing an instance have the same importance in selecting the nearest neighbors. In the real world, the influence of the different attributes on the classification keeps on c...
Journal
Journal title: Register: jurnal ilmiah teknologi sistem informasi
Year: 2022
ISSN: 2502-3357, 2503-0477
DOI: https://doi.org/10.26594/register.v8i1.2397